
issue/334 增加AutoInfinilmProcessor基建 (add AutoInfinilmProcessor infrastructure) #335

Merged
wooway777 merged 5 commits into main from issue/334
May 9, 2026

Conversation

@PanZezhong1725
Collaborator

No description provided.

@PanZezhong1725 PanZezhong1725 changed the title from "issue/334 add processor infra" to "issue/334 增加AutoInfinilmProcessor基建" (add AutoInfinilmProcessor infrastructure) Apr 29, 2026
@@ -0,0 +1,34 @@
class InfinilmProcessor:
Collaborator Author

@PanZezhong1725 PanZezhong1725 Apr 29, 2026


This file is the core change.

With the introduction of multimodal models, different models have different logic for processing input messages. The processing can be abstracted into three steps:

  1. apply chat template: returns text; note that this text cannot be encoded directly, the multimodal-aware process step must be called instead
  2. process: takes the templated prompt plus all images, videos, etc., and returns processed_input (containing PyTorch tensors, a constraint imposed by the HF functionality)
  3. batch: merges the processed_input of every request in the scheduler output into a batch of infinicore tensors (e.g. adding the inputs needed for continuous batching)

@PanZezhong1725 PanZezhong1725 marked this pull request as ready for review April 29, 2026 08:12
@PanZezhong1725 PanZezhong1725 requested review from a team, ma-hang and wooway777 April 29, 2026 08:12
@PanZezhong1725
Collaborator Author

九格 (Jiuge) 7B serving test results are correct (screenshots attached).

@PanZezhong1725 PanZezhong1725 reopened this May 9, 2026
@PanZezhong1725
Collaborator Author

test_infer.py: [screenshot]

test_benchmark.py: [screenshot]

bench.py: [screenshot]

Comment thread: examples/test_infer.py
enable_graph_compiling=enable_graph,
attention_backend=attn_backend,
kv_cache_dtype=cfg.kv_cache_dtype,
model = LLM(
Collaborator


After this change, does the offline-inference unit-test script also go through the serving flow's scheduling and cache management?

Collaborator


For offline unit tests, the path goes through scheduling, the cache queue, and block allocation.

After the PD-disaggregation service update, kv_connecter logic will be added to both the scheduling and cache-management parts, and there will also be kv_connecter code before and after forward.

For jiuge.py, isn't that too much code to pass through?

wooway777 added a commit that referenced this pull request May 9, 2026
issue/334 增加AutoInfinilmProcessor基建 (add AutoInfinilmProcessor infrastructure) #335
@wooway777 wooway777 merged commit 1c831b8 into main May 9, 2026
@wooway777 wooway777 deleted the issue/334 branch May 9, 2026 10:56